Add LiteRT-LM engine support for Android (.litertlm models) by DenisovAV · Pull Request #176 · DenisovAV/flutter_gemma

DenisovAV · 2026-01-23T17:10:15Z

Summary

Add LiteRT-LM SDK integration for Android platform (v0.9.0-alpha01)
Implement Strategy Pattern for engine abstraction (InferenceEngine / InferenceSession)
Support both .task (MediaPipe) and .litertlm (LiteRT-LM) model formats
Add multimodal support (text + image) for LiteRT-LM models
Add Gemma 3 Nano 2B and 4B LiteRT-LM model options in example app
Refactor PreferredBackend enum: add NPU, remove unsupported SDK values

Architecture

InferenceEngine (interface)
├── MediaPipeEngine (.task files)
└── LiteRtLmEngine (.litertlm files)

InferenceSession (interface)  
├── MediaPipeSession
└── LiteRtLmSession

EngineFactory.createFromModelPath() automatically selects the correct engine based on file extension.
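
As a rough illustration of extension-based selection, the sketch below maps a model path to an engine kind. `EngineKind` and `engineKindForPath` are hypothetical names used only here; the real `EngineFactory` returns engine instances and may accept extensions beyond the two listed in this PR.

```kotlin
// Hypothetical stand-in for the PR's engine classes; only the selection logic is illustrated.
enum class EngineKind { MEDIAPIPE, LITERT_LM }

fun engineKindForPath(modelPath: String): EngineKind {
    // Look only at the file name so dots in directory names are ignored.
    val fileName = modelPath.substringAfterLast('/')
    // Text after the last '.', lowercased; empty when the name has no dot.
    val extension = fileName.substringAfterLast('.', missingDelimiterValue = "").lowercase()
    return when (extension) {
        "task" -> EngineKind.MEDIAPIPE
        "litertlm" -> EngineKind.LITERT_LM
        "" -> throw IllegalArgumentException("Model path has no file extension: $modelPath")
        else -> throw IllegalArgumentException("Unsupported model format: .$extension")
    }
}
```

Failing fast on a missing or unknown extension keeps a mistyped path from silently selecting the wrong engine.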

PreferredBackend Changes

Before	After
unknown	❌ removed
cpu	✅ cpu
gpu	✅ gpu
gpuFloat16	❌ removed (not in SDK)
gpuMixed	❌ removed (not in SDK)
gpuFull	❌ removed (not in SDK)
tpu	❌ removed (not in SDK)
—	✅ npu (LiteRT-LM only)

NPU Support:

LiteRT-LM: Full NPU support (Google Tensor, Qualcomm)
MediaPipe: NPU not supported (fallback to default)
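
The fallback rule for MediaPipe can be sketched as follows. The enum mirrors the "After" column of the table above (the definition in pigeon.dart is the source of truth), and `resolveMediaPipeBackend` is a hypothetical helper in which `null` stands for "let the SDK pick its default backend".

```kotlin
// Mirrors the "After" column of the table; not the Pigeon-generated class itself.
enum class PreferredBackend { CPU, GPU, NPU }

// Hypothetical helper: returns the backend to request from MediaPipe,
// or null to leave the choice to the SDK default.
fun resolveMediaPipeBackend(preferred: PreferredBackend?): PreferredBackend? = when (preferred) {
    PreferredBackend.CPU -> PreferredBackend.CPU
    PreferredBackend.GPU -> PreferredBackend.GPU
    // NPU is LiteRT-LM only, so MediaPipe falls back to its default.
    PreferredBackend.NPU, null -> null
}
```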

Key Files

engines/InferenceEngine.kt - Engine abstraction
engines/InferenceSession.kt - Session abstraction
engines/EngineFactory.kt - Factory with auto-detection
engines/litertlm/LiteRtLmEngine.kt - LiteRT-LM implementation
engines/litertlm/LiteRtLmSession.kt - LiteRT-LM session with chunk buffering
engines/mediapipe/MediaPipeEngine.kt - MediaPipe wrapper
pigeon.dart - PreferredBackend enum definition
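
One plausible reading of "chunk buffering" is that chunks passed to `addQueryChunk` are accumulated and sent as a single message when a response is requested, which matches the MediaPipe call pattern. The sketch below shows that idea only; `BufferedSession` and its `sendMessage` callback are hypothetical and not taken from `LiteRtLmSession.kt`.

```kotlin
// Hypothetical session wrapper: sendMessage stands in for the real SDK call.
class BufferedSession(private val sendMessage: (String) -> String) {
    private val lock = Any()
    private val pending = StringBuilder()

    // Accumulate a chunk; nothing is sent to the model yet.
    fun addQueryChunk(chunk: String) {
        synchronized(lock) { pending.append(chunk) }
    }

    // Drain the buffer under the lock, then send the combined prompt outside it.
    fun generateResponse(): String {
        val prompt = synchronized(lock) {
            val text = pending.toString()
            pending.setLength(0)
            text
        }
        return sendMessage(prompt)
    }
}
```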

Known Issues (for future PRs)

Race condition in session access (documented in code review)
No cancellation support in LiteRT-LM SDK 0.9.x
Token counting is estimated (~4 chars/token)
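
The token estimate mentioned above amounts to a character-count heuristic. A minimal sketch, assuming a ratio of 4 characters per token and rounding up (the rounding choice and the names `CHARS_PER_TOKEN` and `estimateTokens` are assumptions, not taken from the PR):

```kotlin
// Assumed heuristic from the PR description: roughly 4 characters per token.
const val CHARS_PER_TOKEN = 4

// Integer ceiling division, so any non-empty text counts as at least one token.
fun estimateTokens(text: String): Int =
    (text.length + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN
```

Such an estimate can be far off for non-English text or code, so it is best treated as a budget hint rather than an exact count.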

Implement Strategy pattern for inference engines with two backends:
- MediaPipe (existing .task files)
- LiteRT-LM (new .litertlm files with multimodal support)

Key changes:
- Add InferenceEngine interface with Engine/Session abstractions
- Add EngineFactory for automatic engine selection based on file extension
- Implement LiteRtLmEngine with visionBackend for multimodal models
- Implement LiteRtLmSession with chunk buffering for MediaPipe compatibility
- Add thread-safety (synchronized locks) in FlutterGemmaPlugin
- Add LiteRT-LM SDK dependency (0.9.0-alpha01)
- Add gemma3n LiteRT-LM model options in example app
- Add unit tests for engines

Tested with Gemma 3 Nano E2B multimodal (text + image) on Pixel 8.

Copilot

Pull request overview

This pull request adds support for LiteRT-LM models (.litertlm files) to the Flutter Gemma plugin by introducing a Strategy Pattern-based engine abstraction layer. The PR refactors the existing MediaPipe inference code into adapters and adds a new LiteRT-LM engine implementation alongside it.

Changes:

Introduces InferenceEngine and InferenceSession abstractions with MediaPipe and LiteRT-LM implementations
Adds EngineFactory for automatic engine selection based on model file extension
Updates FlutterGemmaPlugin to use the new abstraction layer with improved thread safety
Adds two new Gemma 3 Nano model variants (2B and 4B) using LiteRT-LM format in the example app

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/InferenceEngine.kt	Core engine abstraction interface defining initialization, session creation, and capabilities
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/InferenceSession.kt	Session abstraction interface for text/image input and response generation
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/EngineConfig.kt	Configuration data classes and SharedFlow factory for both engines
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/EngineFactory.kt	Factory for automatic engine selection based on file extension
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/mediapipe/MediaPipeEngine.kt	Adapter wrapping existing MediaPipe LlmInference implementation
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/mediapipe/MediaPipeSession.kt	Adapter wrapping existing MediaPipe session implementation
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/litertlm/LiteRtLmEngine.kt	New LiteRT-LM engine implementation with caching support
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/litertlm/LiteRtLmSession.kt	New LiteRT-LM session with chunk buffering for MediaPipe compatibility
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt	Updated to use engine abstraction with enhanced synchronization and cleanup
android/src/test/kotlin/dev/flutterberlin/flutter_gemma/engines/EngineFactoryTest.kt	Comprehensive tests for factory engine selection logic
android/src/test/kotlin/dev/flutterberlin/flutter_gemma/engines/litertlm/LiteRtLmEngineTest.kt	Unit tests for LiteRT-LM engine capabilities and lifecycle
android/src/test/kotlin/dev/flutterberlin/flutter_gemma/engines/litertlm/LiteRtLmSessionTest.kt	Unit tests for LiteRT-LM session including thread safety and token estimation
example/lib/models/model.dart	Adds Gemma 3 Nano 2B and 4B LiteRT-LM model variants, fixes local model filename
android/build.gradle	Adds LiteRT-LM SDK dependency (v0.9.0-alpha01)

Comments suppressed due to low confidence (3)

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:142

Resource leak on initialization failure: If newEngine.initialize() at line 129 throws an exception, the newEngine instance created at line 128 is not closed. This could leak resources if the engine constructor allocated any resources before initialization failed. Consider wrapping the initialize call in a try-catch that closes the engine on failure before rethrowing.

        // Create and initialize new engine BEFORE clearing old state
        // This ensures we don't leave state inconsistent on failure
        val newEngine = EngineFactory.createFromModelPath(modelPath, context)
        newEngine.initialize(config)

        // Only now clear old state and swap in new engine (thread-safe)
        synchronized(engineLock) {
          session?.close()
          session = null
          engine?.close()
          engine = newEngine
        }

        callback(Result.success(Unit))
      } catch (e: Exception) {
        callback(Result.failure(e))
      }
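
The suggestion to close the engine when initialization fails could look roughly like the sketch below. `Engine` is a hypothetical minimal interface (the PR's `InferenceEngine` has more members), and `createInitialized` is an illustrative helper, not code from the plugin.

```kotlin
// Hypothetical minimal engine interface for illustration only.
interface Engine : AutoCloseable {
    fun initialize()
}

// Creates an engine and initializes it; on failure the engine is closed before rethrowing.
fun <E : Engine> createInitialized(create: () -> E): E {
    val engine = create()
    try {
        engine.initialize()
    } catch (e: Exception) {
        // Release whatever the constructor allocated, then surface the original failure.
        try {
            engine.close()
        } catch (closeError: Exception) {
            e.addSuppressed(closeError)
        }
        throw e
    }
    return engine
}
```

The caller can then swap the returned engine in under `engineLock` exactly as the excerpt above does, knowing a failed engine never escapes unclosed.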

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:216

Race condition: Session access is not properly synchronized. Lines 209-211 read the session field outside of synchronization, but the session can be nullified by closeSession() (line 198) or createSession() (lines 184-185) concurrently. This could cause null pointer exceptions or use-after-close errors.

The same issue exists in addQueryChunk (222-224), addImage (235-237), generateResponse (248-250), generateResponseAsync (261-263), and stopGeneration (274-276).

Solution: Wrap the session access in synchronized(engineLock) to ensure consistent access across all methods that read or write to the session field.

  override fun sizeInTokens(prompt: String, callback: (Result<Long>) -> Unit) {
    scope.launch {
      try {
        val currentSession = session
          ?: throw IllegalStateException("Session not created")
        val size = currentSession.sizeInTokens(prompt)
        callback(Result.success(size.toLong()))
      } catch (e: Exception) {
        callback(Result.failure(e))
      }
    }
  }
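
A minimal sketch of the proposed fix, assuming a single lock guards both replacing and using the session. `SessionHolder` is a hypothetical helper, not part of the plugin. Note the trade-off: holding the lock for the whole call means a long-running generation would block `closeSession` until it finishes.

```kotlin
// Hypothetical holder that serializes session use and replacement on one lock.
class SessionHolder<S : AutoCloseable> {
    private val lock = Any()
    private var session: S? = null

    // Closes the current session (if any) and installs the new one atomically.
    fun replace(newSession: S?) {
        synchronized(lock) {
            session?.close()
            session = newSession
        }
    }

    // Runs the block while holding the lock, so close cannot interleave with use.
    fun <T> withSession(block: (S) -> T): T = synchronized(lock) {
        val current = session ?: throw IllegalStateException("Session not created")
        block(current)
    }
}
```

With such a holder, `sizeInTokens` would call `holder.withSession { it.sizeInTokens(prompt) }` instead of reading the `session` field directly.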

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:313

Missing synchronization on engine access: The engine field is accessed inside synchronized(engineLock) at line 290, but the streamJob creation at line 292 happens inside that scope while accessing engine flows (lines 294, 306). If the engine is closed or replaced between checking it at line 290 and accessing its flows, this could result in flows from a closed/different engine being collected.

Additionally, streamJob is modified at line 292 without synchronization, but is also accessed in onCancel() at line 317 without synchronization, which could cause race conditions.

  override fun onListen(arguments: Any?, events: EventChannel.EventSink?) {
    // Cancel previous stream collection to prevent orphaned coroutines
    streamJob?.cancel()
    eventSink = events

    synchronized(engineLock) {
      val currentEngine = engine ?: return

      streamJob = scope.launch {
        launch {
          currentEngine.partialResults.collect { (text, done) ->
            val payload = mapOf("partialResult" to text, "done" to done)
            withContext(Dispatchers.Main) {
              events?.success(payload)
              if (done) {
                events?.endOfStream()
              }
            }
          }
        }

        launch {
          currentEngine.errors.collect { error ->
            withContext(Dispatchers.Main) {
              events?.error("ERROR", error.message, null)
            }
          }
        }
      }
    }

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/mediapipe/MediaPipeEngine.kt

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/litertlm/LiteRtLmEngine.kt

android/src/test/kotlin/dev/flutterberlin/flutter_gemma/engines/litertlm/LiteRtLmSessionTest.kt

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/litertlm/LiteRtLmEngine.kt

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/engines/EngineFactory.kt

- Remove non-existent SDK values: unknown, gpuFloat16, gpuMixed, gpuFull, tpu
- Add NPU backend support for LiteRT-LM (Google Tensor, Qualcomm)
- Simplify backend mapping across all engines
- Use Pigeon-generated PreferredBackend directly instead of PreferredBackendEnum
- Update tests for NPU backend
- Fix Copilot review issues: typo in test comment, error message for missing extension

Copilot

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (5)

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:213

Race condition: The session variable is accessed without synchronization. After reading val currentSession = session on line 206, another thread could call closeSession() (line 191-200) and set session to null, causing the subsequent call to currentSession.sizeInTokens(prompt) to operate on a session that has been closed. The same issue exists in addQueryChunk, addImage, generateResponse, generateResponseAsync, and stopGeneration methods.

The session variable should either be marked as @Volatile and accessed within synchronized blocks, or the entire method body should be wrapped in synchronized(engineLock) { ... }. Compare with createSession (lines 168-183) and closeSession (lines 192-199), which properly use synchronized(engineLock).

  override fun sizeInTokens(prompt: String, callback: (Result<Long>) -> Unit) {
    scope.launch {
      try {
        val currentSession = session
          ?: throw IllegalStateException("Session not created")
        val size = currentSession.sizeInTokens(prompt)
        callback(Result.success(size.toLong()))
      } catch (e: Exception) {
        callback(Result.failure(e))
      }
    }
  }

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:227

Race condition: Same session synchronization issue as in sizeInTokens. The session variable is accessed without proper synchronization, allowing another thread to close the session between reading the reference and using it.

  override fun addQueryChunk(prompt: String, callback: (Result<Unit>) -> Unit) {
    scope.launch {
      try {
        val currentSession = session
          ?: throw IllegalStateException("Session not created")
        currentSession.addQueryChunk(prompt)
        callback(Result.success(Unit))
      } catch (e: Exception) {
        callback(Result.failure(e))
      }
    }
  }

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:240

Race condition: Same session synchronization issue. The session variable is accessed without proper synchronization, allowing another thread to close the session between reading the reference and using it.

  override fun addImage(imageBytes: ByteArray, callback: (Result<Unit>) -> Unit) {
    scope.launch {
      try {
        val currentSession = session
          ?: throw IllegalStateException("Session not created")
        currentSession.addImage(imageBytes)
        callback(Result.success(Unit))
      } catch (e: Exception) {
        callback(Result.failure(e))
      }
    }
  }

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:253

Race condition: Same session synchronization issue. The session variable is accessed without proper synchronization, allowing another thread to close the session between reading the reference and using it.

  override fun generateResponse(callback: (Result<String>) -> Unit) {
    scope.launch {
      try {
        val currentSession = session
          ?: throw IllegalStateException("Session not created")
        val result = currentSession.generateResponse()
        callback(Result.success(result))
      } catch (e: Exception) {
        callback(Result.failure(e))
      }
    }
  }

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:265

Race condition: Same session synchronization issue. The session variable is accessed without proper synchronization, allowing another thread to close the session between reading the reference and using it.

  override fun generateResponseAsync(callback: (Result<Unit>) -> Unit) {
    scope.launch {
      try {
        val currentSession = session
          ?: throw IllegalStateException("Session not created")
        currentSession.generateResponseAsync()
        callback(Result.success(Unit))
      } catch (e: Exception) {
        callback(Result.failure(e))
      }
    }
  }

android/src/test/kotlin/dev/flutterberlin/flutter_gemma/engines/litertlm/LiteRtLmSessionTest.kt

- Document backend support per platform (Android, iOS, Web, Desktop)
- Clarify that CPU is not supported on Web (MediaPipe limitation)
- Clarify that NPU is Android-only (.litertlm models)
- Add docstrings to PreferredBackend enum in pigeon.dart
- Update proto comments for desktop backend options

Copilot

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt:310

Race condition: The engine reference is captured inside a synchronized block but then used outside of it when launching the coroutine. If another thread calls createModel or closeModel between capturing the reference and the coroutine starting to collect from it, the engine could be closed while the coroutine is still collecting from its flows.

Consider capturing the flows (partialResults and errors) inside the synchronized block and collecting from those captured references, or keep the engine reference alive with proper lifecycle management.

    synchronized(engineLock) {
      val currentEngine = engine ?: return

      streamJob = scope.launch {
        launch {
          currentEngine.partialResults.collect { (text, done) ->
            val payload = mapOf("partialResult" to text, "done" to done)
            withContext(Dispatchers.Main) {
              events?.success(payload)
              if (done) {
                events?.endOfStream()
              }
            }
          }
        }

        launch {
          currentEngine.errors.collect { error ->
            withContext(Dispatchers.Main) {
              events?.error("ERROR", error.message, null)
            }
          }
        }
      }
    }

Copilot · 2026-01-24T06:18:55Z

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt

+        val currentSession = session
+          ?: throw IllegalStateException("Session not created")
+        val result = currentSession.generateResponse()


Race condition: Session is accessed without synchronization. If another thread calls closeSession, closeModel, or createSession while this method is executing, the captured session reference could point to a closed session, leading to use-after-close errors.

Consider using synchronized(engineLock) when capturing the session reference to ensure thread-safe access.

Copilot · 2026-01-24T06:18:55Z

android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt

+        val currentSession = session
+          ?: throw IllegalStateException("Session not created")
+        currentSession.cancelGeneration()


Race condition: Session is accessed without synchronization. If another thread calls closeSession, closeModel, or createSession while this method is executing, the captured session reference could point to a closed session, leading to use-after-close errors.

Consider using synchronized(engineLock) when capturing the session reference to ensure thread-safe access.

Copilot · 2026-01-24T06:18:56Z

CLAUDE.md

+    ├── EngineConfig.kt            # Configuration data classes
+    ├── EngineFactory.kt           # Factory for engine creation
+    ├── FlowFactory.kt             # SharedFlow factory


Documentation inaccuracy: FlowFactory is documented as a separate file (FlowFactory.kt) but it's actually part of EngineConfig.kt as an object. Update the architecture diagram to reflect that FlowFactory is within EngineConfig.kt, not a separate file.

Suggested change

-    ├── EngineConfig.kt            # Configuration data classes
-    ├── EngineFactory.kt           # Factory for engine creation
-    ├── FlowFactory.kt             # SharedFlow factory
+    ├── EngineConfig.kt            # Configuration data classes and FlowFactory object (SharedFlow factory)
+    ├── EngineFactory.kt           # Factory for engine creation

DenisovAV requested a review from Copilot January 23, 2026 18:08

Copilot started reviewing on behalf of DenisovAV January 23, 2026 18:09

Copilot AI reviewed Jan 23, 2026

DenisovAV requested a review from Copilot January 23, 2026 18:53

Copilot started reviewing on behalf of DenisovAV January 23, 2026 18:54

Copilot AI reviewed Jan 23, 2026

android/src/test/kotlin/dev/flutterberlin/flutter_gemma/engines/litertlm/LiteRtLmSessionTest.kt (outdated, resolved)

DenisovAV added 2 commits January 24, 2026 11:07

Fix misleading test: sizeInTokens counts prompt, not accumulated chunks

0432719

DenisovAV requested a review from Copilot January 24, 2026 06:08

Copilot started reviewing on behalf of DenisovAV January 24, 2026 06:09

Copilot AI reviewed Jan 24, 2026

Fix CLAUDE.md: FlowFactory is in EngineConfig.kt, not separate file

016e389

DenisovAV merged commit edd01a8 into main Jan 24, 2026
3 checks passed

DenisovAV merged 5 commits into main from feature/android-litertlm
